A Tool for Semi-Automatic Generation and Maintenance of Taxonomies from Semi-Structured Documents

Author

  • Marcello Leida
Abstract

This chapter introduces OntoExtractor, a tool for the semi-automatic generation of a taxonomy from a set of documents or data sources. The tool builds the taxonomy bottom-up. Starting from a structural analysis of the documents, it produces a set of clusters, which can be refined by a further grouping based on content analysis. Metadata describing the content of each cluster is generated automatically and analysed by the tool to produce the final taxonomy. A simulation of a tool for maintaining the taxonomy, based on an implicit and explicit voting mechanism, is also described. The author describes a system that can generate the taxonomy from heterogeneous sources of information, using wrappers to convert the original format of each document to a structured one. In this way, OntoExtractor can generate a taxonomy from virtually any source of information simply by adding the appropriate wrapper. Moreover, the trust mechanism provides a reliable method for maintaining the taxonomy and for coping with the unavoidable generation of wrong classes in the taxonomy.
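
The bottom-up flow described in the abstract (structural clustering first, then a content-based refinement of each cluster) can be sketched roughly as follows. This is only an illustration under stated assumptions, not OntoExtractor's actual algorithm: the document representation (a set of tag paths plus a set of terms), the Jaccard similarity measure, the greedy grouping strategy, the thresholds, and the names `Doc`, `jaccard`, `cluster`, and `two_stage_clusters` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    name: str
    paths: frozenset  # structural features, e.g. tag paths (assumed representation)
    terms: frozenset  # content features, e.g. normalised words (assumed representation)


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(docs, key, threshold):
    """Greedy single-pass grouping: a document joins the first cluster whose
    representative (its first member) is at least `threshold` similar to it."""
    clusters = []
    for doc in docs:
        for members in clusters:
            if jaccard(key(doc), key(members[0])) >= threshold:
                members.append(doc)
                break
        else:
            # No existing cluster was similar enough: start a new one.
            clusters.append([doc])
    return clusters


def two_stage_clusters(docs, structure_threshold=0.5, content_threshold=0.3):
    """Cluster by structure first, then refine each cluster by content."""
    refined = []
    for group in cluster(docs, lambda d: d.paths, structure_threshold):
        refined.extend(cluster(group, lambda d: d.terms, content_threshold))
    return refined


docs = [
    Doc("a", frozenset({"/cv/name", "/cv/skills"}), frozenset({"java", "xml"})),
    Doc("b", frozenset({"/cv/name", "/cv/skills"}), frozenset({"java", "sql"})),
    Doc("c", frozenset({"/cv/name", "/cv/skills"}), frozenset({"painting"})),
    Doc("d", frozenset({"/invoice/total"}), frozenset({"java"})),
]
result = [[d.name for d in group] for group in two_stage_clusters(docs)]
print(result)  # [['a', 'b'], ['c'], ['d']]
```

In this toy input, documents a, b, and c share the same structure and fall into one structural cluster; the content pass then separates c, whose terms do not overlap with those of a and b.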

Similar resources

A Frame-Based System for Automatic Classification of Semi- Structured Data

The problem of data classification goes back to the definition of taxonomies covering knowledge areas. With the advent of the Web, the amount of data available increased several orders of magnitude, making manual data classification impossible. This work presents a tool to automatically classify semi-structured data, represented by frames, without any previous knowledge about structured classes...

Semi-Automatic Wrapper Generation for Commercial Web Sources

Semi-automatic wrapper generation tools aim to ease the task of building structured views over semi-structured web sources. But the wrapper generation techniques presented up to date are unable to properly deal with sources requiring complex navigational sequences for accessing data. In this paper, we present Wargo, a semi-automatic wrapper generation tool, which has been used by non-programmer...

A Semi Automatic Tool For Schema Mapping

...generic mapping framework at the schema level to address the problem of schema interoperability ... Providing a formalism for developing a generic, extensible, and semi-automated mapping ... A semi-automatic tool for schema mapping ... at the University of Washington in Seattle, where he founded the database group ... on Clio, the first semi-automatic tool for heterogeneous schema mapping. Keywords: data integ...

JIS 28/2 00 prelims

Ontology is an important emerging discipline that has the huge potential to improve information organization, management and understanding. It has a crucial role to play in enabling content-based access, interoperability, communications, and providing qualitatively new levels of services on the next wave of web transformation in the form of the Semantic Web. The issues pertaining to ontology ge...

The Wargo System: Semi-Automatic Wrapper Generation in Presence of Complex Data Access Modes

Semi-automatic wrapper generation tools aim to ease the task of building structured views over web sources. But the wrapper generation techniques presented up to date show several weaknesses when dealing with the complex commercial web sources of today, especially when constructing advanced navigational sequences for accessing data. We present Wargo, a semi-automatic wrapper generation tool, whi...

Publication date: 2015